Overview
Brought to you by YData
Dataset statistics
| Number of variables | 7 |
|---|---|
| Number of observations | 1338 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 1 |
| Duplicate rows (%) | 0.1% |
| Total size in memory | 286.5 KiB |
| Average record size in memory | 219.3 B |
Variable types
| Numeric | 4 |
|---|---|
| Categorical | 2 |
| Boolean | 1 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 1 (0.1%) duplicate rows | Duplicates |
| age is highly overall correlated with charges | High correlation |
| charges is highly overall correlated with age and 1 other field (smoker) | High correlation |
| smoker is highly overall correlated with charges | High correlation |
| children has 574 (42.9%) zeros | Zeros |
Reproduction
| Analysis started | 2025-09-05 05:52:19.582048 |
|---|---|
| Analysis finished | 2025-09-05 05:52:21.270892 |
| Duration | 1.69 seconds |
| Software version | ydata-profiling v4.16.1 |
| Configuration | config.json |
Variables
age
Real number (ℝ)
High correlation 
| Distinct | 47 |
|---|---|
| Distinct (%) | 3.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 39.207025 |
| Minimum | 18 |
| Maximum | 64 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.6 KiB |
Quantile statistics
| Minimum | 18 |
|---|---|
| 5th percentile | 18 |
| Q1 | 27 |
| median | 39 |
| Q3 | 51 |
| 95th percentile | 62 |
| Maximum | 64 |
| Range | 46 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 14.04996 |
|---|---|
| Coefficient of variation (CV) | 0.35835313 |
| Kurtosis | -1.2450877 |
| Mean | 39.207025 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.055672516 |
| Sum | 52459 |
| Variance | 197.40139 |
| Monotonicity | Not monotonic |
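Several statistics in these tables are functions of others, so they can be cross-checked. The sketch below recomputes the coefficient of variation (standard deviation divided by the mean), the variance (the squared standard deviation), the range and the IQR from the age values reported above. The helper `derived_stats` is a hypothetical name, and any difference in the last digits comes from rounding of the reported inputs.

```python
import math

# Summary values copied from the age tables above.
age = {"mean": 39.207025, "std": 14.04996, "min": 18, "max": 64, "q1": 27, "q3": 51}


def derived_stats(s: dict) -> dict:
    """Recompute statistics that follow directly from other reported ones."""
    return {
        "cv": s["std"] / s["mean"],   # coefficient of variation
        "variance": s["std"] ** 2,    # variance is the squared standard deviation
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
    }


result = derived_stats(age)
for name, value in result.items():
    print(f"{name}: {value:.8g}")

# Compare with the reported CV (0.35835313) and variance (197.40139), allowing for rounding.
print(math.isclose(result["cv"], 0.35835313, rel_tol=1e-6))
print(math.isclose(result["variance"], 197.40139, rel_tol=1e-6))
```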
Histogram with fixed size bins (bins=47)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 69 | 5.2% |
| 19 | 68 | 5.1% |
| 50 | 29 | 2.2% |
| 51 | 29 | 2.2% |
| 47 | 29 | 2.2% |
| 46 | 29 | 2.2% |
| 45 | 29 | 2.2% |
| 20 | 29 | 2.2% |
| 48 | 29 | 2.2% |
| 52 | 29 | 2.2% |
| Other values (37) | 969 | 72.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 69 | 5.2% |
| 19 | 68 | 5.1% |
| 20 | 29 | 2.2% |
| 21 | 28 | 2.1% |
| 22 | 28 | 2.1% |
| 23 | 28 | 2.1% |
| 24 | 28 | 2.1% |
| 25 | 28 | 2.1% |
| 26 | 28 | 2.1% |
| 27 | 28 | 2.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 64 | 22 | 1.6% |
| 63 | 23 | 1.7% |
| 62 | 23 | 1.7% |
| 61 | 23 | 1.7% |
| 60 | 23 | 1.7% |
| 59 | 25 | 1.9% |
| 58 | 25 | 1.9% |
| 57 | 26 | 1.9% |
| 56 | 26 | 1.9% |
| 55 | 26 | 1.9% |
sex
Categorical
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.9895366 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | female |
|---|---|
| 2nd row | male |
| 3rd row | male |
| 4th row | male |
| 5th row | male |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| male | 676 | 50.5% |
| female | 662 | 49.5% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 2000 | 30.0% |
| m | 1338 | 20.0% |
| a | 1338 | 20.0% |
| l | 1338 | 20.0% |
| f | 662 | 9.9% |
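The length statistics and the character counts for sex both follow from the two category counts in Common Values (676 male, 662 female), because every row contributes its category's length and characters. For example, the mean length is (676 × 4 + 662 × 6) / 1338 = 4.9895366. A minimal standard-library sketch; the variable names are illustrative and not part of ydata-profiling.

```python
import statistics
from collections import Counter

# Category counts from the sex "Common Values" table.
category_counts = {"male": 676, "female": 662}
n_rows = sum(category_counts.values())

# Length statistics: each row contributes the length of its category string.
row_lengths = sorted(
    len(value) for value, count in category_counts.items() for _ in range(count)
)
length_stats = {
    "max": max(row_lengths),
    "median": statistics.median(row_lengths),
    "mean": statistics.fmean(row_lengths),
    "min": min(row_lengths),
}

# Character counts: characters of each category, weighted by its row count.
char_counts = Counter()
for value, count in category_counts.items():
    for ch, k in Counter(value).items():
        char_counts[ch] += k * count
total_chars = sum(char_counts.values())

print(length_stats)
for ch, count in char_counts.most_common():
    print(f"{ch}: {count} ({100 * count / total_chars:.1f}%)")
```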
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 6676 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 6676 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 6676 | 100.0% |
bmi
Real number (ℝ)
| Distinct | 548 |
|---|---|
| Distinct (%) | 41.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.663397 |
| Minimum | 15.96 |
| Maximum | 53.13 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.6 KiB |
Quantile statistics
| Minimum | 15.96 |
|---|---|
| 5th percentile | 21.256 |
| Q1 | 26.29625 |
| median | 30.4 |
| Q3 | 34.69375 |
| 95th percentile | 41.106 |
| Maximum | 53.13 |
| Range | 37.17 |
| Interquartile range (IQR) | 8.3975 |
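The fractional quartiles of bmi (for example Q1 = 26.29625) suggest interpolation between neighbouring order statistics. A common rule, and the default of pandas `Series.quantile`, is linear interpolation at position (n - 1) × q of the sorted data, which for n = 1338 puts Q1 at position 334.25. Assuming the report follows that rule, the hypothetical helper `quantile_linear` below implements it and cross-checks it against `statistics.quantiles(..., method="inclusive")`. It is applied to the ten bmi values from the first rows of the Sample section, because the full column is not reproduced in this report.

```python
import statistics


def quantile_linear(values, q):
    """Quantile by linear interpolation between order statistics."""
    data = sorted(values)
    pos = (len(data) - 1) * q           # fractional index into the sorted data
    lo = int(pos)                       # floor, since pos >= 0
    hi = min(lo + 1, len(data) - 1)
    frac = pos - lo
    return data[lo] + frac * (data[hi] - data[lo])


# bmi values of the first ten rows in the Sample section (illustration only).
bmi_sample = [27.900, 33.770, 33.000, 22.705, 28.880, 25.740, 33.440, 27.740, 29.830, 25.840]

q1 = quantile_linear(bmi_sample, 0.25)
q3 = quantile_linear(bmi_sample, 0.75)
print(q1, q3, q3 - q1)  # Q1, Q3 and the IQR of the sample

# Cross-check against the standard library's "inclusive" method.
ref_q1, _, ref_q3 = statistics.quantiles(bmi_sample, n=4, method="inclusive")
print(ref_q1, ref_q3)
```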
Descriptive statistics
| Standard deviation | 6.0981869 |
|---|---|
| Coefficient of variation (CV) | 0.19887513 |
| Kurtosis | -0.050731531 |
| Mean | 30.663397 |
| Median Absolute Deviation (MAD) | 4.18 |
| Skewness | 0.28404711 |
| Sum | 41027.625 |
| Variance | 37.187884 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 32.3 | 13 | 1.0% |
| 28.31 | 9 | 0.7% |
| 30.495 | 8 | 0.6% |
| 30.875 | 8 | 0.6% |
| 31.35 | 8 | 0.6% |
| 30.8 | 8 | 0.6% |
| 34.1 | 8 | 0.6% |
| 28.88 | 8 | 0.6% |
| 33.33 | 7 | 0.5% |
| 35.2 | 7 | 0.5% |
| Other values (538) | 1254 | 93.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.96 | 1 | 0.1% |
| 16.815 | 2 | 0.1% |
| 17.195 | 1 | 0.1% |
| 17.29 | 3 | 0.2% |
| 17.385 | 1 | 0.1% |
| 17.4 | 1 | 0.1% |
| 17.48 | 1 | 0.1% |
| 17.67 | 1 | 0.1% |
| 17.765 | 1 | 0.1% |
| 17.8 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 53.13 | 1 | 0.1% |
| 52.58 | 1 | 0.1% |
| 50.38 | 1 | 0.1% |
| 49.06 | 1 | 0.1% |
| 48.07 | 1 | 0.1% |
| 47.74 | 1 | 0.1% |
| 47.6 | 1 | 0.1% |
| 47.52 | 1 | 0.1% |
| 47.41 | 1 | 0.1% |
| 46.75 | 1 | 0.1% |
children
Real number (ℝ)
Zeros 
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.0949178 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 574 |
| Zeros (%) | 42.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95th percentile | 3 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.2054927 |
|---|---|
| Coefficient of variation (CV) | 1.1009893 |
| Kurtosis | 0.20245415 |
| Mean | 1.0949178 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.93838044 |
| Sum | 1465 |
| Variance | 1.4532127 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 574 | 42.9% |
| 1 | 324 | 24.2% |
| 2 | 240 | 17.9% |
| 3 | 157 | 11.7% |
| 4 | 25 | 1.9% |
| 5 | 18 | 1.3% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 574 | 42.9% |
| 1 | 324 | 24.2% |
| 2 | 240 | 17.9% |
| 3 | 157 | 11.7% |
| 4 | 25 | 1.9% |
| 5 | 18 | 1.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 18 | 1.3% |
| 4 | 25 | 1.9% |
| 3 | 157 | 11.7% |
| 2 | 240 | 17.9% |
| 1 | 324 | 24.2% |
| 0 | 574 | 42.9% |
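Because children takes only six distinct values, the value counts above determine the whole column, so every moment-based statistic in its Descriptive statistics table can be recomputed from them. This sketch assumes the sample variance (divisor n - 1) and the bias-adjusted skewness and excess-kurtosis estimators used by pandas `Series.skew` and `Series.kurt`. Under those assumptions it should reproduce the reported mean 1.0949178, variance 1.4532127, skewness 0.93838044 and kurtosis 0.20245415.

```python
import math

# Value counts of children, copied from the tables above.
counts = {0: 574, 1: 324, 2: 240, 3: 157, 4: 25, 5: 18}

n = sum(counts.values())
total = sum(x * c for x, c in counts.items())
mean = total / n


def central_moment(k: int) -> float:
    """k-th central moment about the mean, divided by n (population form)."""
    return sum(c * (x - mean) ** k for x, c in counts.items()) / n


m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)

variance = m2 * n / (n - 1)  # sample variance (divisor n - 1)
std = math.sqrt(variance)
cv = std / mean

# Bias-adjusted skewness (G1) and excess kurtosis (G2), as assumed above.
g1 = m3 / m2 ** 1.5
skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)
g2 = m4 / m2 ** 2 - 3
kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(f"n={n} sum={total} mean={mean:.7f}")
print(f"variance={variance:.7f} std={std:.7f} cv={cv:.7f}")
print(f"skewness={skewness:.8f} kurtosis={kurtosis:.8f}")
```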
smoker
Boolean
High correlation 
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 1.4 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| False | 1064 | 79.5% |
| True | 274 | 20.5% |
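The Sample section shows smoker stored as `yes`/`no`, while this section reports it as Boolean with False/True. The sketch below illustrates one way such a column can be recognised and mapped. The helpers `looks_boolean` and `to_bool` and the list of accepted value pairs are assumptions for illustration, not ydata-profiling's actual type-inference rules.

```python
# Assumed value pairs that mark a text column as Boolean (illustrative only).
BOOLEAN_PAIRS = [{"yes", "no"}, {"true", "false"}, {"y", "n"}]
TRUE_WORDS = {"yes", "true", "y"}


def looks_boolean(values) -> bool:
    """True if the distinct, case-folded values fit inside one assumed pair."""
    distinct = {str(v).strip().lower() for v in values}
    return any(distinct <= pair for pair in BOOLEAN_PAIRS)


def to_bool(value: str) -> bool:
    """Map a yes/no style string to a Python bool."""
    return value.strip().lower() in TRUE_WORDS


# smoker values of the first ten rows in the Sample section.
smoker_sample = ["yes", "no", "no", "no", "no", "no", "no", "no", "no", "no"]
print(looks_boolean(smoker_sample))
print([to_bool(v) for v in smoker_sample][:3])

# Share of True from the full-column counts reported above.
bool_counts = {True: 274, False: 1064}
share_true = 100 * bool_counts[True] / sum(bool_counts.values())
print(f"True: {share_true:.1f}%")
```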
region
Categorical
| Distinct | 4 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 86.4 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 9 |
| Min length | 9 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | southwest |
|---|---|
| 2nd row | southeast |
| 3rd row | southeast |
| 4th row | northwest |
| 5th row | northwest |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| southeast | 364 | 27.2% |
| southwest | 325 | 24.3% |
| northwest | 325 | 24.3% |
| northeast | 324 | 24.2% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| t | 2676 | 22.2% |
| s | 2027 | 16.8% |
| o | 1338 | 11.1% |
| h | 1338 | 11.1% |
| e | 1338 | 11.1% |
| u | 689 | 5.7% |
| a | 688 | 5.7% |
| w | 650 | 5.4% |
| n | 649 | 5.4% |
| r | 649 | 5.4% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 12042 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 12042 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 12042 | 100.0% |
charges
Real number (ℝ)
High correlation 
| Distinct | 1337 |
|---|---|
| Distinct (%) | 99.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 13270.422 |
| Minimum | 1121.8739 |
| Maximum | 63770.428 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 10.6 KiB |
Quantile statistics
| Minimum | 1121.8739 |
|---|---|
| 5th percentile | 1757.7534 |
| Q1 | 4740.2872 |
| median | 9382.033 |
| Q3 | 16639.913 |
| 95th percentile | 41181.828 |
| Maximum | 63770.428 |
| Range | 62648.554 |
| Interquartile range (IQR) | 11899.625 |
Descriptive statistics
| Standard deviation | 12110.011 |
|---|---|
| Coefficient of variation (CV) | 0.91255659 |
| Kurtosis | 1.6062987 |
| Mean | 13270.422 |
| Median Absolute Deviation (MAD) | 5018.7571 |
| Skewness | 1.5158797 |
| Sum | 17755825 |
| Variance | 1.4665237 × 10⁸ |
| Monotonicity | Not monotonic |
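charges is strongly right-skewed (skewness 1.52, mean 13270.42 well above the median 9382.03). This is why its MAD (5018.76) is much smaller than its standard deviation (12110.01): the median-based statistic largely ignores the long upper tail. The sketch assumes the report's MAD is the unscaled median of absolute deviations from the median, which matches the MAD of 1 reported for children (a normal-consistent scaled MAD would be about 1.48). It uses the first ten charges values from the Sample section for illustration and the children value counts for the check; the helper name is hypothetical.

```python
import statistics


def median_absolute_deviation(values) -> float:
    """Unscaled MAD: the median of the absolute deviations from the median."""
    values = list(values)
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


# charges of the first ten rows in the Sample section (illustration only).
charges_sample = [16884.924, 1725.5523, 4449.462, 21984.47061, 3866.8552,
                  3756.6216, 8240.5896, 7281.5056, 6406.4107, 28923.13692]
sample_mad = median_absolute_deviation(charges_sample)
sample_std = statistics.stdev(charges_sample)
print(f"MAD={sample_mad:.5f} std={sample_std:.5f}")

# Full children column rebuilt from its value counts; the report lists MAD = 1.
children_counts = {0: 574, 1: 324, 2: 240, 3: 157, 4: 25, 5: 18}
children = [x for x, c in children_counts.items() for _ in range(c)]
children_mad = median_absolute_deviation(children)
print(children_mad)
```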
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1639.5631 | 2 | 0.1% |
| 16884.924 | 1 | 0.1% |
| 29330.98315 | 1 | 0.1% |
| 2221.56445 | 1 | 0.1% |
| 19798.05455 | 1 | 0.1% |
| 13063.883 | 1 | 0.1% |
| 13555.0049 | 1 | 0.1% |
| 44202.6536 | 1 | 0.1% |
| 10422.91665 | 1 | 0.1% |
| 7243.8136 | 1 | 0.1% |
| Other values (1327) | 1327 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1121.8739 | 1 | 0.1% |
| 1131.5066 | 1 | 0.1% |
| 1135.9407 | 1 | 0.1% |
| 1136.3994 | 1 | 0.1% |
| 1137.011 | 1 | 0.1% |
| 1137.4697 | 1 | 0.1% |
| 1141.4451 | 1 | 0.1% |
| 1146.7966 | 1 | 0.1% |
| 1149.3959 | 1 | 0.1% |
| 1163.4627 | 1 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 63770.42801 | 1 | 0.1% |
| 62592.87309 | 1 | 0.1% |
| 60021.39897 | 1 | 0.1% |
| 58571.07448 | 1 | 0.1% |
| 55135.40209 | 1 | 0.1% |
| 52590.82939 | 1 | 0.1% |
| 51194.55914 | 1 | 0.1% |
| 49577.6624 | 1 | 0.1% |
| 48970.2476 | 1 | 0.1% |
| 48885.13561 | 1 | 0.1% |
Correlations
|  | age | bmi | charges | children | region | sex | smoker |
|---|---|---|---|---|---|---|---|
| age | 1.000 | 0.108 | 0.534 | 0.057 | 0.000 | 0.000 | 0.043 |
| bmi | 0.108 | 1.000 | 0.119 | 0.016 | 0.164 | 0.000 | 0.000 |
| charges | 0.534 | 0.119 | 1.000 | 0.133 | 0.065 | 0.063 | 0.832 |
| children | 0.057 | 0.016 | 0.133 | 1.000 | 0.000 | 0.000 | 0.038 |
| region | 0.000 | 0.164 | 0.065 | 0.000 | 1.000 | 0.000 | 0.057 |
| sex | 0.000 | 0.000 | 0.063 | 0.000 | 0.000 | 1.000 | 0.069 |
| smoker | 0.043 | 0.000 | 0.832 | 0.038 | 0.057 | 0.069 | 1.000 |
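The High correlation alerts in the overview can be traced back to this matrix. Assuming an alert threshold of 0.5 (an assumption here, not read from config.json), scanning the upper triangle finds exactly two pairs: age/charges (0.534) and charges/smoker (0.832). Each pair is reported from both sides, which is why charges is listed as correlated with age and one other field. The helper `high_correlation_pairs` is hypothetical.

```python
# Correlation matrix copied from the Correlations table above.
labels = ["age", "bmi", "charges", "children", "region", "sex", "smoker"]
matrix = [
    [1.000, 0.108, 0.534, 0.057, 0.000, 0.000, 0.043],
    [0.108, 1.000, 0.119, 0.016, 0.164, 0.000, 0.000],
    [0.534, 0.119, 1.000, 0.133, 0.065, 0.063, 0.832],
    [0.057, 0.016, 0.133, 1.000, 0.000, 0.000, 0.038],
    [0.000, 0.164, 0.065, 0.000, 1.000, 0.000, 0.057],
    [0.000, 0.000, 0.063, 0.000, 0.000, 1.000, 0.069],
    [0.043, 0.000, 0.832, 0.038, 0.057, 0.069, 1.000],
]

THRESHOLD = 0.5  # assumed alert threshold, not read from config.json


def high_correlation_pairs(labels, matrix, threshold):
    """Return (a, b, r) for each pair above the threshold in the upper triangle."""
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):  # j > i skips the diagonal and mirrored cells
            if abs(matrix[i][j]) > threshold:
                pairs.append((labels[i], labels[j], matrix[i][j]))
    return pairs


pairs = high_correlation_pairs(labels, matrix, THRESHOLD)
for a, b, r in pairs:
    print(f"{a} <-> {b}: {r:.3f}")
```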
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
Sample
First rows
|  | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.900 | 0 | yes | southwest | 16884.92400 |
| 1 | 18 | male | 33.770 | 1 | no | southeast | 1725.55230 |
| 2 | 28 | male | 33.000 | 3 | no | southeast | 4449.46200 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.880 | 0 | no | northwest | 3866.85520 |
| 5 | 31 | female | 25.740 | 0 | no | southeast | 3756.62160 |
| 6 | 46 | female | 33.440 | 1 | no | southeast | 8240.58960 |
| 7 | 37 | female | 27.740 | 3 | no | northwest | 7281.50560 |
| 8 | 37 | male | 29.830 | 2 | no | northeast | 6406.41070 |
| 9 | 60 | female | 25.840 | 0 | no | northwest | 28923.13692 |
Last rows
|  | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 1328 | 23 | female | 24.225 | 2 | no | northeast | 22395.74424 |
| 1329 | 52 | male | 38.600 | 2 | no | southwest | 10325.20600 |
| 1330 | 57 | female | 25.740 | 2 | no | southeast | 12629.16560 |
| 1331 | 23 | female | 33.400 | 0 | no | southwest | 10795.93733 |
| 1332 | 52 | female | 44.700 | 3 | no | southwest | 11411.68500 |
| 1333 | 50 | male | 30.970 | 3 | no | northwest | 10600.54830 |
| 1334 | 18 | female | 31.920 | 0 | no | northeast | 2205.98080 |
| 1335 | 18 | female | 36.850 | 0 | no | southeast | 1629.83350 |
| 1336 | 21 | female | 25.800 | 0 | no | southwest | 2007.94500 |
| 1337 | 61 | female | 29.070 | 0 | yes | northwest | 29141.36030 |
Duplicate rows
Most frequently occurring
|  | age | sex | bmi | children | smoker | region | charges | # duplicates |
|---|---|---|---|---|---|---|---|---|
| 0 | 19 | male | 30.59 | 0 | no | northwest | 1639.5631 | 2 |
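The overview's Duplicate rows figure (1) and the # duplicates column here (2) appear to count different things. Assuming the first counts copies beyond the first occurrence (the semantics of pandas `DataFrame.duplicated()` with its default `keep="first"`), and the second counts every occurrence of the repeated row, the sketch below shows both on a hypothetical four-row dataset built from the duplicated row and two Sample rows. On the full 1338 rows the same logic gives 1 extra copy, or about 0.1%.

```python
from collections import Counter

# Hypothetical mini-dataset: the duplicated row listed above, written twice,
# plus two rows from the Sample section.
rows = [
    (19, "male", 30.59, 0, "no", "northwest", 1639.5631),
    (18, "male", 33.77, 1, "no", "southeast", 1725.5523),
    (19, "male", 30.59, 0, "no", "northwest", 1639.5631),
    (28, "male", 33.00, 3, "no", "southeast", 4449.462),
]

occurrences = Counter(rows)  # rows match only if every column is equal

# "# duplicates" column: how many times each repeated row occurs.
duplicated_groups = {row: count for row, count in occurrences.items() if count > 1}

# "Duplicate rows" statistic: copies beyond the first occurrence of each row.
extra_copies = sum(count - 1 for count in duplicated_groups.values())
share = 100 * extra_copies / len(rows)

print(duplicated_groups)
print(extra_copies, f"{share:.1f}%")
```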